%load_ext watermark
%watermark -v -a "author: eli knaap" -d -u -p segregation,libpysal,geopandas,geosnap
Author: author: eli knaap

Last updated: 2022-06-20

Python implementation: CPython
Python version       : 3.9.13
IPython version      : 8.4.0

segregation: 2.3.1
libpysal   : 4.6.2
geopandas  : 0.10.2
geosnap    : 0.10.0
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
import hvplot.pandas
from segregation import singlegroup, multigroup, dynamics, batch
from geosnap import datasets, Community
from geosnap.analyze import segdyn
from geosnap.visualize import plot_timeseries
dc = gpd.read_parquet("data/dc_income.parquet")
dc.head()
| | geoid | geometry | very_low_inc | low_inc | med_inc | high_inc | very_high_inc | share_very_low_inc | share_low_inc | share_med_inc | share_high_inc | share_very_high_inc | total | year |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 110010010022 | MULTIPOLYGON (((319901.558 4312052.717, 319906... | 28.0 | 112.0 | 92.0 | 261.0 | 282.0 | 0.036129 | 0.144516 | 0.118710 | 0.336774 | 0.363871 | 775.0 | 2012 |
| 1 | 110010010022 | MULTIPOLYGON (((319901.558 4312052.717, 319906... | 36.0 | 134.0 | 58.0 | 204.0 | 288.0 | 0.050000 | 0.186111 | 0.080556 | 0.283333 | 0.400000 | 720.0 | 2013 |
| 2 | 110010010022 | MULTIPOLYGON (((319901.558 4312052.717, 319906... | 23.0 | 186.0 | 65.0 | 166.0 | 355.0 | 0.028931 | 0.233962 | 0.081761 | 0.208805 | 0.446541 | 795.0 | 2014 |
| 3 | 110010010022 | MULTIPOLYGON (((319901.558 4312052.717, 319906... | 22.0 | 223.0 | 75.0 | 226.0 | 299.0 | 0.026036 | 0.263905 | 0.088757 | 0.267456 | 0.353846 | 845.0 | 2015 |
| 4 | 110010010022 | MULTIPOLYGON (((319901.558 4312052.717, 319906... | 21.0 | 200.0 | 51.0 | 190.0 | 424.0 | 0.023702 | 0.225734 | 0.057562 | 0.214447 | 0.478555 | 886.0 | 2016 |
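In the rows shown, `total` equals the sum of the five income bins, and each `share_*` column is the bin count divided by `total`. The stdlib sketch below checks this on the first row, with values copied from the table. The `shares` helper is hypothetical and not part of geosnap.

```python
# Income-bin counts for the first row of dc.head() (geoid 110010010022, year 2012),
# copied from the table above
row = {
    "very_low_inc": 28.0,
    "low_inc": 112.0,
    "med_inc": 92.0,
    "high_inc": 261.0,
    "very_high_inc": 282.0,
    "total": 775.0,
}
bins = ["very_low_inc", "low_inc", "med_inc", "high_inc", "very_high_inc"]


def shares(record, cols):
    """Return the summed count over `cols` and each column's share of that sum."""
    total = sum(record[c] for c in cols)
    return total, {f"share_{c}": record[c] / total for c in cols}


total, row_shares = shares(row, bins)
print("sum of bins:", total, "| total column:", row["total"])
for name, value in row_shares.items():
    print(f"{name}: {value:.6f}")
```

The printed shares match the `share_*` columns of that row to six decimal places.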
from IPython.display import IFrame
plot_timeseries(dc, 'share_very_high_inc', nrows=2, ncols=4, figsize=(18,10), cmap='Blues', alpha=0.8)
cols = ['very_low_inc', 'low_inc', 'med_inc', 'high_inc', 'very_high_inc']
multi_by_time = segdyn.multigroup_tempdyn(dc, cols)
multi_by_time
| Name / year | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 |
|---|---|---|---|---|---|---|---|
| GlobalDistortion | 38.7812 | 38.2548 | 36.9974 | 35.8396 | 34.4924 | 32.7230 | 31.2007 |
| MultiDissim | 0.3231 | 0.3206 | 0.3180 | 0.3163 | 0.3136 | 0.3127 | 0.3142 |
| MultiDivergence | 0.2097 | 0.2056 | 0.2008 | 0.1981 | 0.1923 | 0.1887 | 0.1858 |
| MultiDiversity | 1.5325 | 1.5286 | 1.5245 | 1.5219 | 1.5155 | 1.5006 | 1.4819 |
| MultiGini | 0.4465 | 0.4438 | 0.4402 | 0.4378 | 0.4336 | 0.4327 | 0.4345 |
| MultiInfoTheory | 0.1368 | 0.1345 | 0.1317 | 0.1302 | 0.1269 | 0.1257 | 0.1253 |
| MultiNormExposure | 0.1182 | 0.1175 | 0.1161 | 0.1153 | 0.1138 | 0.1145 | 0.1168 |
| MultiRelativeDiversity | 0.1136 | 0.1125 | 0.1107 | 0.1098 | 0.1079 | 0.1077 | 0.1085 |
| MultiSquaredCoefVar | 0.1038 | 0.1019 | 0.0991 | 0.0982 | 0.0956 | 0.0942 | 0.0924 |
| SimpsonsConcentration | 0.2320 | 0.2340 | 0.2359 | 0.2371 | 0.2402 | 0.2470 | 0.2559 |
| SimpsonsInteraction | 0.7680 | 0.7660 | 0.7641 | 0.7629 | 0.7598 | 0.7530 | 0.7441 |
multi_by_time.T.plot()
# GlobalDistortion is on a much larger scale than the other indices, so drop it by name to see what's happening with the rest
multi_by_time.drop(index='GlobalDistortion').T.plot()
Most indices are decreasing slightly over time.
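Another way to compare indices on very different scales is to index each one to its 2012 value. This also handles GlobalDistortion without dropping it. The stdlib sketch below does this for the first and last years, using values copied from the `multi_by_time` table. It checks which indices rise between 2012 and 2018. The dictionary is only a stand-in for the DataFrame.

```python
# 2012 and 2018 values copied from the multi_by_time table above
first_last = {
    "GlobalDistortion": (38.7812, 31.2007),
    "MultiDissim": (0.3231, 0.3142),
    "MultiDivergence": (0.2097, 0.1858),
    "MultiDiversity": (1.5325, 1.4819),
    "MultiGini": (0.4465, 0.4345),
    "MultiInfoTheory": (0.1368, 0.1253),
    "MultiNormExposure": (0.1182, 0.1168),
    "MultiRelativeDiversity": (0.1136, 0.1085),
    "MultiSquaredCoefVar": (0.1038, 0.0924),
    "SimpsonsConcentration": (0.2320, 0.2559),
    "SimpsonsInteraction": (0.7680, 0.7441),
}

# Dividing by the 2012 value puts every index on a common relative scale,
# so GlobalDistortion no longer dwarfs the other series
relative_change = {name: end / start - 1 for name, (start, end) in first_last.items()}
increasing = sorted(name for name, change in relative_change.items() if change > 0)

for name, change in sorted(relative_change.items(), key=lambda item: item[1]):
    print(f"{name:>24} {change:+.1%}")
print("increasing:", increasing)
```

Only SimpsonsConcentration rises between 2012 and 2018. Every other index, including GlobalDistortion, falls.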
fig, axs = plt.subplots(1,2, figsize=(10,4))
multi_by_time.loc['MultiDissim'].plot(ax=axs[0])
multi_by_time.loc['MultiDissim'].plot(kind='bar', ax=axs[1])
fig.suptitle("Multigroup Dissimilarity")
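Multigroup dissimilarity generalizes the two-group index. It sums, over units and groups, how far each unit's group shares deviate from the region's shares. The sum is weighted by unit population and normalized by Simpson's interaction index. The result runs from 0 (every unit mirrors the region) to 1 (complete separation).

The sketch below follows the formula as described by Reardon and Firebaugh (2002) and uses hypothetical counts. The segregation package's `MultiDissim` may differ in details such as missing-data handling, and `multi_dissim` is a hypothetical name.

```python
def multi_dissim(units):
    """Multigroup dissimilarity for a list of per-unit group-count lists."""
    n_groups = len(units[0])
    unit_totals = [sum(u) for u in units]
    T = sum(unit_totals)
    # Region-wide share of each group
    P = [sum(u[m] for u in units) / T for m in range(n_groups)]
    # Simpson's interaction index: sum of P_m * (1 - P_m)
    I = sum(p * (1 - p) for p in P)
    # Population-weighted absolute gaps between unit shares and regional shares
    gap = sum(
        t * abs(u[m] / t - P[m])
        for u, t in zip(units, unit_totals)
        for m in range(n_groups)
    )
    return gap / (2 * T * I)


# Hypothetical three-group regions
separated = [[10, 0, 0], [0, 10, 0], [0, 0, 10]]  # each unit holds a single group
mixed = [[5, 5, 5], [5, 5, 5]]                    # every unit mirrors the region
print(multi_dissim(separated), multi_dissim(mixed))
```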
One exception to the downward trend is SimpsonsConcentration, which increases over time. SimpsonsInteraction also bucks the trend. It decreases over time, but for an interaction index a decrease corresponds to an increase in segregation. The divergence between indices suggests that segregation may be changing in different ways across its different dimensions.
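In the table, SimpsonsInteraction equals 1 - SimpsonsConcentration in every year, so the two rows carry the same information in mirror image. The textbook definition of Simpson's index (the sum of squared group shares) produces exactly that complement. The sketch below assumes that definition, which may differ in detail from the segregation package's implementation. The `simpsons` helper and its input counts are hypothetical.

```python
def simpsons(counts):
    """Textbook Simpson's concentration (sum of squared group shares) and interaction (its complement)."""
    total = sum(counts)
    shares = [c / total for c in counts]
    concentration = sum(p * p for p in shares)
    return concentration, 1 - concentration


# Hypothetical region with five equally sized income groups: the minimum-concentration case
conc, inter = simpsons([100, 100, 100, 100, 100])
print(round(conc, 4), round(inter, 4))

# (concentration, interaction) pairs for 2012 and 2018 from the multi_by_time table
table = {2012: (0.2320, 0.7680), 2018: (0.2559, 0.7441)}
for year, (c, i) in table.items():
    print(year, round(c + i, 4))
```

Under this definition, five equally sized groups give the lowest possible concentration of 0.2.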
fig, axs = plt.subplots(1,2, figsize=(10,4))
multi_by_time.loc['SimpsonsConcentration'].plot(ax=axs[0])
multi_by_time.loc['SimpsonsConcentration'].plot(kind='bar', ax=axs[1])
fig.suptitle("Simpson's Concentration")
from geosnap.analyze.segdyn import singlegroup_tempdyn
singlegroup_tempdyn?
Signature: singlegroup_tempdyn(gdf, group_pop_var=None, total_pop_var=None, time_index='year', n_jobs=-1, backend='loky', **index_kwargs)
Docstring:
Batch compute singlegroup segregation indices for each time period in parallel.

Parameters
----------
gdf : geopandas.GeoDataFrame
    geodataframe formatted as a long-form timeseries
group_pop_var : str
    name of column on gdf containing population counts for the group of interest
total_pop_var : str
    name of column on gdf containing total population counts for the unit
time_index : str
    column on the dataframe that denotes unique time periods, by default "year"
n_jobs : int, optional
    number of cores to use for computation. If -1, all available cores will be used, by default -1
backend : str, optional
    computation backend passed to joblib. One of {'multiprocessing', 'loky', 'threading'}, by default "loky"

Returns
-------
geopandas.GeoDataFrame
    dataframe with unique segregation indices as rows and estimates for each time period as columns
File: ~/mambaforge/envs/pysal-workshop/lib/python3.9/site-packages/geosnap/analyze/segdyn.py
Type: function
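The docstring describes the core idea: split the long-form table by its time column, then compute an index for each period in parallel. The plain-Python sketch below illustrates that pattern with the standard two-group dissimilarity index and hypothetical records. It is not geosnap's implementation. `tempdyn` and `dissim` are hypothetical names, and a `concurrent.futures` thread pool stands in for joblib.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def dissim(group, total):
    """Two-group dissimilarity: the group versus everyone else in each unit."""
    X = sum(group)
    Y = sum(total) - X
    return 0.5 * sum(abs(g / X - (t - g) / Y) for g, t in zip(group, total))


def tempdyn(rows, index_fn, group_var, total_var, time_index="year", max_workers=None):
    """Compute one index value per time period from long-form records."""
    by_period = defaultdict(list)
    for row in rows:
        by_period[row[time_index]].append(row)
    periods = sorted(by_period)

    def compute(period):
        records = by_period[period]
        return index_fn([r[group_var] for r in records], [r[total_var] for r in records])

    # Each period is independent, so the periods can be mapped concurrently
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        values = list(pool.map(compute, periods))
    return dict(zip(periods, values))


# Hypothetical long-form data: two tracts observed in two years
rows = [
    {"geoid": "a", "year": 2012, "very_high_inc": 10, "total": 10},
    {"geoid": "b", "year": 2012, "very_high_inc": 0, "total": 10},
    {"geoid": "a", "year": 2013, "very_high_inc": 5, "total": 10},
    {"geoid": "b", "year": 2013, "very_high_inc": 5, "total": 10},
]
result = tempdyn(rows, dissim, "very_high_inc", "total")
print(result)
```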
segs_single = segdyn.singlegroup_tempdyn(dc, group_pop_var='very_high_inc', total_pop_var='total')
segs_single
| Name / year | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 |
|---|---|---|---|---|---|---|---|
| AbsoluteCentralization | 0.6979 | 0.6973 | 0.6967 | 0.6972 | 0.6953 | 0.6923 | 0.6899 |
| AbsoluteClustering | 0.1258 | 0.1274 | 0.1308 | 0.1324 | 0.1346 | 0.1412 | 0.1504 |
| AbsoluteConcentration | 0.6550 | 0.6497 | 0.6414 | 0.6404 | 0.6265 | 0.6087 | 0.5940 |
| Atkinson | 0.2569 | 0.2521 | 0.2479 | 0.2445 | 0.2362 | 0.2297 | 0.2276 |
| BiasCorrectedDissim | 0.3977 | 0.3936 | 0.3909 | 0.3878 | 0.3840 | 0.3791 | 0.3779 |
| BoundarySpatialDissim | 0.2470 | 0.2424 | 0.2417 | 0.2385 | 0.2355 | 0.2288 | 0.2241 |
| ConProf | 0.2956 | 0.2944 | 0.2948 | 0.2920 | 0.2950 | 0.2988 | 0.3094 |
| CorrelationR | 0.2040 | 0.2020 | 0.2003 | 0.1980 | 0.1946 | 0.1929 | 0.1946 |
| Delta | 0.6851 | 0.6849 | 0.6837 | 0.6830 | 0.6801 | 0.6780 | 0.6773 |
| DensityCorrectedDissim | 0.3974 | 0.3934 | 0.3906 | 0.3876 | 0.3836 | 0.3786 | 0.3777 |
| Dissim | 0.3990 | 0.3950 | 0.3922 | 0.3892 | 0.3852 | 0.3802 | 0.3791 |
| DistanceDecayInteraction | 0.6068 | 0.6023 | 0.5974 | 0.5951 | 0.5881 | 0.5757 | 0.5578 |
| DistanceDecayIsolation | 0.4150 | 0.4202 | 0.4260 | 0.4279 | 0.4353 | 0.4493 | 0.4682 |
| Entropy | 0.1750 | 0.1724 | 0.1701 | 0.1679 | 0.1634 | 0.1600 | 0.1596 |
| Gini | 0.5451 | 0.5406 | 0.5370 | 0.5337 | 0.5267 | 0.5204 | 0.5187 |
| Interaction | 0.5316 | 0.5279 | 0.5240 | 0.5233 | 0.5185 | 0.5073 | 0.4906 |
| Isolation | 0.4684 | 0.4721 | 0.4760 | 0.4767 | 0.4815 | 0.4927 | 0.5094 |
| MinMax | 0.5704 | 0.5663 | 0.5634 | 0.5603 | 0.5561 | 0.5509 | 0.5498 |
| ModifiedDissim | 0.3777 | 0.3736 | 0.3708 | 0.3683 | 0.3646 | 0.3596 | 0.3584 |
| ModifiedGini | 0.5216 | 0.5170 | 0.5140 | 0.5106 | 0.5032 | 0.4971 | 0.4959 |
| PARDissim | 0.3819 | 0.3779 | 0.3753 | 0.3723 | 0.3684 | 0.3633 | 0.3618 |
| RelativeCentralization | 0.0909 | 0.0905 | 0.0931 | 0.0961 | 0.0959 | 0.0927 | 0.0914 |
| RelativeClustering | 0.1794 | 0.1757 | 0.1760 | 0.1746 | 0.1587 | 0.1421 | 0.1188 |
| RelativeConcentration | 0.0169 | 0.0183 | 0.0076 | 0.0135 | -0.0098 | -0.0262 | -0.0142 |
| SpatialDissim | 0.2398 | 0.2358 | 0.2347 | 0.2326 | 0.2294 | 0.2224 | 0.2169 |
| SpatialProxProf | 0.5464 | 0.5626 | 0.5715 | 0.5748 | 0.6046 | 0.6426 | 0.6858 |
| SpatialProximity | 1.1001 | 1.0988 | 1.0984 | 1.0978 | 1.0969 | 1.0960 | 1.0970 |
segs_single.T.hvplot(height=600)
IFrame('https://www.jstor.org/stable/2579183', height=600, width=800)
(segs_single.T[['Gini', 'Entropy', 'Dissim', 'Atkinson']].hvplot(title='Evenness Dimension', width=380, height=400).opts(legend_position='bottom', show_grid=True) +
segs_single.T[['AbsoluteConcentration', 'RelativeConcentration' , 'Delta']].hvplot(title='Concentration Dimension', width=380, height=400).opts(legend_position='bottom', show_grid=True) +
segs_single.T[['AbsoluteClustering', 'Isolation', 'CorrelationR', 'Interaction', 'SpatialProxProf']].hvplot(title='Exposure/Clustering Dimension', width=380, height=400).opts(legend_position='bottom', show_grid=True))
segs_single.T[['AbsoluteClustering', 'Isolation', 'SpatialProxProf', 'Interaction']].pct_change(periods=5)  # each year pools 5 years of samples, so compare only years 5 apart (non-overlapping sampling periods)
| year | AbsoluteClustering | Isolation | SpatialProxProf | Interaction |
|---|---|---|---|---|
| 2012 | NaN | NaN | NaN | NaN |
| 2013 | NaN | NaN | NaN | NaN |
| 2014 | NaN | NaN | NaN | NaN |
| 2015 | NaN | NaN | NaN | NaN |
| 2016 | NaN | NaN | NaN | NaN |
| 2017 | 0.122417 | 0.051879 | 0.176061 | -0.045711 |
| 2018 | 0.180534 | 0.079009 | 0.218983 | -0.070657 |
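The 2017 row compares the sampling periods 2008-2012 and 2013-2017, and the 2018 row compares 2009-2013 and 2014-2018. In both comparisons, AbsoluteClustering, Isolation, and SpatialProxProf increase (by roughly 5% to 22%), while Interaction decreases (by roughly 5% to 7%).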
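The `periods=5` offset follows from how the estimates were sampled. The 2012 value covers 2008-2012 and the 2017 value covers 2013-2017. This sketch assumes each labeled year pools the five preceding survey years (as in 5-year ACS products, an assumption about this dataset). Under that design, only estimates at least five years apart draw on disjoint samples.

The helper names are hypothetical. The percentage change reproduces the Isolation entry in the 2017 row from the rounded table values.

```python
def acs_window(year, span=5):
    """Survey years pooled into an estimate labeled `year` (assumes a 5-year pooled design)."""
    return (year - span + 1, year)


def windows_overlap(year_a, year_b, span=5):
    """True if the sampling windows behind two labeled years share any survey year."""
    start_a, end_a = acs_window(year_a, span)
    start_b, end_b = acs_window(year_b, span)
    return start_a <= end_b and start_b <= end_a


def pct_change(old, new):
    """Relative change, the same quantity pandas' pct_change reports."""
    return new / old - 1


# Rounded Isolation values for 2012 and 2017 from the segs_single table
isolation = {2012: 0.4684, 2017: 0.4927}

print(acs_window(2012), acs_window(2017), windows_overlap(2012, 2017))
print(windows_overlap(2012, 2016))
print(round(pct_change(isolation[2012], isolation[2017]), 4))
```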
from segregation.singlegroup import Entropy
# compute Entropy for each year at distances of 500, 1000, ..., 5000
d = segdyn.spacetime_dyn(dc, Entropy, group_pop_var='very_high_inc', total_pop_var='total', distances=list(range(500, 5500, 500)))
d.plot(cmap='Reds')
Entropy is falling fastest at small scales: the gap between the lines is wider on the left-hand side of the graph (short distances) than on the right-hand side (long distances).
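Here "scale" is the distance that defines each unit's local environment. A simplified distance-band version of the entropy index (Theil's H) shows why an index can change with distance. Each unit's composition is replaced by the pooled counts of all units within distance `d`. At `d = 0` this reduces to the ordinary aspatial index. Once `d` spans the whole region, every environment looks like the region and the index falls to 0.

This is an assumed stand-in with hypothetical coordinates and counts. `spacetime_dyn` may define and weight local environments differently.

```python
import math


def binary_entropy(p):
    """Entropy of a two-group split; 0 * log(0) is treated as 0."""
    return -sum(x * math.log(x) for x in (p, 1 - p) if x > 0)


def local_sum(coords, values, d):
    """For each unit, sum `values` over all units within distance d (itself included)."""
    return [
        sum(v for (xj, yj), v in zip(coords, values) if math.hypot(xi - xj, yi - yj) <= d)
        for xi, yi in coords
    ]


def spatial_entropy_index(coords, group, total, d):
    """Theil's H with each unit's composition replaced by its local environment."""
    T = sum(total)
    E = binary_entropy(sum(group) / T)  # regional entropy
    g_loc = local_sum(coords, group, d)
    t_loc = local_sum(coords, total, d)
    local_E = [binary_entropy(g / t) for g, t in zip(g_loc, t_loc)]
    # Weight each local entropy by the unit's own population
    return 1 - sum(t * e for t, e in zip(total, local_E)) / (T * E)


coords = [(0, 0), (1, 0), (2, 0), (3, 0)]  # hypothetical tract centroids
group = [9, 8, 1, 2]                        # hypothetical group counts
total = [10, 10, 10, 10]                    # hypothetical tract populations

for dist in (0, 1, 10):
    print(dist, round(spatial_entropy_index(coords, group, total, dist), 4))
```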
iso = segdyn.spacetime_dyn(dc, singlegroup.Isolation, group_pop_var='very_high_inc', total_pop_var='total', distances=list(range(500,5500,500)))
iso.plot(cmap='Reds')
Isolation is growing fastest at large scales: the gap between the lines widens at larger distances, on the right-hand side of the graph.
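The isolation index averages, over members of the group, the group's share of the population where they live. A distance-band version replaces "where they live" with the pooled environment within distance `d`. At `d = 0` it is the ordinary isolation index. Once `d` spans the region, it equals the group's regional share.

The sketch assumes this simplified weighting, which may differ from what `spacetime_dyn` does internally. The coordinates and counts are hypothetical.

```python
import math


def local_sum(coords, values, d):
    """For each unit, sum `values` over all units within distance d (itself included)."""
    return [
        sum(v for (xj, yj), v in zip(coords, values) if math.hypot(xi - xj, yi - yj) <= d)
        for xi, yi in coords
    ]


def spatial_isolation(coords, group, total, d):
    """Isolation index using each unit's local-environment share of the group."""
    X = sum(group)
    g_loc = local_sum(coords, group, d)
    t_loc = local_sum(coords, total, d)
    # Weight each unit's local group share by the unit's share of all group members
    return sum((g / X) * (gl / tl) for g, gl, tl in zip(group, g_loc, t_loc))


coords = [(0, 0), (1, 0), (2, 0), (3, 0)]  # hypothetical tract centroids
group = [9, 8, 1, 2]                        # hypothetical group counts
total = [10, 10, 10, 10]                    # hypothetical tract populations

for dist in (0, 1, 10):
    print(dist, round(spatial_isolation(coords, group, total, dist), 4))
```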
from geosnap.visualize import animate_timeseries
animate_timeseries(dc, 'share_very_high_inc', filename='figs/dc_high_inc_change.gif', fps=1.5)
from IPython.display import Image
Image('figs/dc_high_inc_change.gif',width=800)
The story in DC is one of increasing isolation of the affluent at large spatial scales.
By many metrics, the region appears to be becoming less segregated by income over time, but that increase in evenness is largely due to the entire region getting richer. By contrast, the exposure dimension shows that residents with the highest incomes increasingly live among other very-high-income residents, and so are less exposed to other income groups. That change is happening fastest at large spatial scales. Put differently, we see a trend akin to agglomeration, whereby the large wealthy enclaves are becoming even more pronounced.
(remember that this example makes some very liberal assumptions about the input data, so the "takeaways" here are just for illustration)
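Evenness can improve while isolation rises when the group itself grows. The isolation index can never fall below the group's regional share, while dissimilarity depends only on how evenly the group is spread. The toy comparison below uses a hypothetical four-tract region with fixed tract populations. It is an illustration of the mechanism, not DC data.

```python
def dissim(group, total):
    """Two-group dissimilarity: half the summed gap between each unit's share of the group and of everyone else."""
    X = sum(group)
    Y = sum(total) - X
    return 0.5 * sum(abs(g / X - (t - g) / Y) for g, t in zip(group, total))


def isolation(group, total):
    """Isolation: average share of the group in the unit of a typical group member."""
    X = sum(group)
    return sum((g / X) * (g / t) for g, t in zip(group, total))


totals = [10, 10, 10, 10]  # hypothetical tract populations, unchanged between years
before = [8, 2, 0, 0]      # hypothetical affluent counts: 25% of the region
after = [10, 8, 6, 0]      # hypothetical affluent counts: 60% of the region

for label, group in (("before", before), ("after", after)):
    print(label, round(dissim(group, totals), 3), round(isolation(group, totals), 3))
```

In this example, dissimilarity falls from about 0.73 to 0.625, while isolation rises from 0.68 to about 0.83.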
The Python dashboarding ecosystem is evolving quickly, so we won't opine on which platform or toolset is best. But if you have a personal favorite, geosnap is performant enough to power an urban analytics dashboard on the fly. The example below wraps a simple streamlit interface around the workflow above, which lets us explore every metro region quickly.
